Online Self-Organised Map Classifiers as Text Filters for Spam Email Detection

Authors

  • Bogdan Vrusias
  • Ian Golledge
Abstract

Email is today a primary means of working and communicating for most businesses and for the general public, so being able to send and receive email efficiently is a must. Spam detection and removal is therefore vital for successful email communication, security and convenience. This paper describes a novel way of analysing and filtering incoming emails based on the salient text (keyword) features identified within them. The method presented shows promising results, performs significantly better than other statistical and probabilistic methods, and offers a mechanism that can automatically adapt to new (unseen) email trends. The salient features of emails are selected automatically using functions that combine word frequency with other discriminating matrices, and are then encoded into appropriate numerical vector models. The method is compared against the state-of-the-art Multinomial Naïve Bayes, Support Vector Machine and Boosted Decision Tree classifiers for identifying spam. The proposed automatically adaptable feature extractor, combined with the online Self-Organising Map, seems to give significantly better results at minimal cost.
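The pipeline outlined in the abstract (keyword selection, vector encoding, online Self-Organising Map) can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency-difference score in `select_keywords`, the binary encoding, the Gaussian neighbourhood, and all names and default parameters below are assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter


def select_keywords(spam_docs, ham_docs, k):
    """Pick k words whose relative frequency differs most between classes.

    Assumption: the abstract does not give the selection function, so this
    simple frequency-difference score is a stand-in.
    """
    spam = Counter(w for d in spam_docs for w in d.lower().split())
    ham = Counter(w for d in ham_docs for w in d.lower().split())
    spam_total = sum(spam.values()) or 1
    ham_total = sum(ham.values()) or 1

    def score(word):
        return abs(spam[word] / spam_total - ham[word] / ham_total)

    vocab = set(spam) | set(ham)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-score(w), w))[:k]


def encode(doc, keywords):
    """Binary vector: 1.0 if the keyword occurs in the document, else 0.0."""
    words = set(doc.lower().split())
    return [1.0 if kw in words else 0.0 for kw in keywords]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineSOM:
    """Small rectangular SOM updated one sample at a time."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.weights = {
            (r, c): [rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }
        self.labels = {}  # node -> Counter of class labels seen at that node

    def best_matching_unit(self, x):
        return min(self.weights, key=lambda n: sq_dist(self.weights[n], x))

    def update(self, x, lr=0.3, radius=1.0):
        """Move the best-matching unit and its grid neighbours towards x."""
        bmu = self.best_matching_unit(x)
        for node, w in self.weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
        return bmu

    def label(self, x, cls):
        """Record a known class for the node that x maps to."""
        bmu = self.best_matching_unit(x)
        self.labels.setdefault(bmu, Counter())[cls] += 1

    def classify(self, x):
        """Return the majority class of the closest labelled node, if any."""
        if not self.labels:
            return None
        node = min(self.labels, key=lambda n: sq_dist(self.weights[n], x))
        return self.labels[node].most_common(1)[0][0]
```

In use, one would call `update` on each incoming encoded email, `label` on the emails whose class is known, and `classify` on new ones. Because `update` works per sample, the map can keep adapting as new email trends arrive, which is the property the abstract emphasises.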

Similar resources

Adaptable Text Filters and Unsupervised Neural Classifiers for Spam Detection

Spam detection has become a necessity for successful email communications, security and convenience. This paper describes a learning process where the text of incoming emails is analysed and filtered based on the salient features identified. The method described has promising results and at the same time significantly better performance than other statistical and probabilistic methods. The sali...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical forms of communication. The growing number of email users has led to an increase in spam in recent years. Spam not only costs users money, time and bandwidth, but has also become a risk to the efficiency, reliability and security of networks. Spam developers are always trying to find ways to evade existing filters, therefore new filters to de...

Adaptive Spam Filtering Using Only Naive Bayes Text Classifiers

In the past few years, machine learning, and in particular simple Naive Bayes classifiers, have proven their value in filtering spam emails. We hereby put Naive Bayes filters to the test against potentially more elaborate spam filters that will participate in the CEAS 2008 challenge. For this purpose, we use the variants of Naive Bayes that have proven more effective in our earlier studies. Furt...

Stacking Classifiers for Anti-Spam Filtering of E-Mail

We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the eff...

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that is harmful to communications around the world. Spam is a growing problem for personal email, so detecting it is essential. Machine learning is well suited to this problem, as its adaptive nature lets it learn the patterns required for classification. Nonetheless, in spam detection, there are a large num...

Publication date: 2009